Detecting Incorrect Numerical Data in DBpedia
Abstract
DBpedia is a central hub of Linked Open Data (LOD). Being based on crowd-sourced content and heuristic extraction methods, it is not free of errors. In this paper, we study the application of unsupervised numerical outlier detection methods to DBpedia, using Interquantile Range (IQR), Kernel Density Estimation (KDE), and various dispersion estimators, combined with different semantic grouping methods. Our approach reaches 87% precision, and has led to the identification of 11 systematic errors in the DBpedia extraction framework.
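To make the IQR idea named in the abstract concrete, here is a minimal sketch of range-based outlier flagging. It assumes the common 1.5 × IQR fence between the first and third quartiles. The function name `iqr_outliers`, the multiplier `k`, and the sample heights are hypothetical illustrations, not the paper's implementation or data.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: quartiles are computed with the "inclusive" method of
    statistics.quantiles; other quantile definitions shift the fences slightly.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical person heights in metres; 178.0 mimics a cm-vs-m unit mix-up.
heights = [1.62, 1.70, 1.75, 1.80, 1.68, 1.77, 178.0, 1.73]
print(iqr_outliers(heights))
```

In a setting like the one the abstract describes, such a check would presumably be run separately for each semantic group of resources (for example, all values of one property on instances of one type), so that values which are normal for one group are not flagged merely because they are unusual for another.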
Similar Resources
Detecting hidden errors in an ontology using contextual knowledge
Due to modeling errors in designing ontologies, an ontology may carry incorrect information. Ontology debugging can be helpful in detecting errors in ontologies that are increasing in size and expressiveness day by day. While current ontology debugging methods can detect logical errors (incoherences and inconsistencies), they are incapable of detecting hidden modeling errors in coherent and con...
Correcting Range Violation Errors in DBpedia
A range violation error is a problem when an object of a knowledge graph triple does not have a type required by the range of the triple’s predicate. This paper aims to correct these erroneous triples in DBpedia by finding correct objects with the required type to replace the incorrect objects. Our approach is based on graph analysis and keyword matching. It also exploits information from the i...
Detecting Errors in Numerical Linked Data Using Cross-Checked Outlier Detection
Outlier detection used for identifying wrong values in data is typically applied to single datasets to search them for values of unexpected behavior. In this work, we instead propose an approach which combines the outcomes of two independent outlier detection runs to get a more reliable result and to also prevent problems arising from natural outliers which are exceptional values in the dataset...
A comparison of complex correspondence detection techniques
One to one correspondences between entities are not always sufficient to describe the true relationship between related entities in diverse ontologies, and complex correspondences are needed instead. We demonstrate the types of complex correspondence occurring between two LOD sources and compare techniques for discovering these complex correspondences. 1 Motivation and Background Most alignment...
WhoKnows? Evaluating linked data heuristics with a quiz that cleans up DBpedia
Semantic technologies enable sophisticated search scenarios on educational video content. Linking Open Data (LOD) provides a vast amount of well-structured semantic information in heterogeneous domains. But, despite the syntactically well-expressed RDF facts, when authoring and publishing LOD, many inconsistencies may occur, especially if the data is generated with the help of automated metho...